Finding Correlated CCC-Biclusters from Gene Expression Data

Abstract

Several unsupervised machine learning methods have been used to analyze gene expression data obtained from microarray experiments. Recently, biclustering, an unsupervised approach that clusters the row and column dimensions of the data matrix simultaneously, has been shown to be remarkably effective in a variety of applications. The goal of biclustering is to find subgroups of genes and subgroups of experimental conditions in which the genes exhibit highly correlated behavior. These correlated behaviors correspond to coherent expression patterns and can be used to identify potential regulatory modules involved in regulatory mechanisms. Many specific versions of the biclustering problem have been shown to be NP-complete (nondeterministic polynomial-time complete). However, when the aim is to identify biclusters in time series expression data, the problem can be restricted to finding only maximal biclusters with contiguous columns, and this restriction makes it tractable. The motivation is that biological processes start and finish in an identifiable contiguous period of time, leading to increased (or decreased) activity of sets of genes that form biclusters with contiguous columns. In this context, an algorithm is presented that finds and reports all maximal contiguous column coherent biclusters (CCC-Biclusters) in time linear in the size of the expression matrix. Each relevant CCC-Bicluster identified corresponds to a coherent expression pattern shared by a group of genes in a contiguous subset of time points and identifies a potentially relevant regulatory module. The linear time complexity of CCC-Biclustering is obtained by manipulating a discretized version of the gene expression matrix and using efficient string processing techniques based on suffix trees. Results on synthetic and real data show the effectiveness of the approach and the relevance of CCC-Biclustering in the discovery of regulatory modules.
These results were obtained by applying the algorithm to the transcriptomic expression patterns occurring in Saccharomyces cerevisiae in response to heat stress. They show not only the ability of the proposed methodology to extract relevant information compatible with documented biological knowledge, but also the utility of this algorithm for studying other environmental stresses and regulatory modules in general.

Geethamani S. et al., International Journal of Computer Science and Mobile Computing, Vol. 3, Issue 4, April 2014, pp. 1019-1034. © 2014, IJCSMC. All Rights Reserved.
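The abstract describes CCC-Biclustering only at a high level. As a rough illustration of the two ideas it mentions (discretizing the expression matrix, and what counts as a maximal CCC-Bicluster), here is a minimal Python sketch under stated assumptions. The discretization rule (symbols U/D/N from the change between consecutive time points, using a hypothetical `threshold`), the function names `discretize` and `maximal_ccc_biclusters`, and the `min_rows` parameter are all hypothetical choices made for illustration. The enumeration is deliberately brute force, roughly O(m·n³) for m genes and n discretized columns, and is not the linear-time suffix-tree algorithm the paper describes. Its only purpose is to show which biclusters such an algorithm should report. Note that with this discretization, column j of the discretized matrix stands for the transition from time point j to time point j + 1.

```python
from collections import defaultdict


def discretize(matrix, threshold=0.5):
    """Hypothetical discretization: one symbol per pair of consecutive time points.

    'U' = increase above `threshold`, 'D' = decrease below -`threshold`,
    'N' = no significant change. A row with n time points yields n - 1 symbols.
    """
    rows = []
    for values in matrix:
        symbols = []
        for prev, cur in zip(values, values[1:]):
            diff = cur - prev
            if diff > threshold:
                symbols.append("U")
            elif diff < -threshold:
                symbols.append("D")
            else:
                symbols.append("N")
        rows.append("".join(symbols))
    return rows


def maximal_ccc_biclusters(rows, min_rows=2):
    """Brute-force enumeration of maximal CCC-Biclusters (NOT the linear-time method).

    Returns (gene_indices, first_col, last_col, pattern) tuples. Every contiguous
    column range is tried and each slice costs O(n), so this is roughly O(m * n^3).
    """
    if not rows:
        return []
    n_cols = len(rows[0])
    found = []
    for start in range(n_cols):
        for end in range(start, n_cols):
            # Group all genes by their symbol string on columns start..end;
            # this makes every group row-maximal by construction.
            groups = defaultdict(list)
            for gene, symbols in enumerate(rows):
                groups[symbols[start:end + 1]].append(gene)
            for pattern, genes in groups.items():
                if len(genes) < min_rows:
                    continue
                # Not left-maximal if all genes also agree on column start - 1.
                if start > 0 and len({rows[g][start - 1] for g in genes}) == 1:
                    continue
                # Not right-maximal if all genes also agree on column end + 1.
                if end < n_cols - 1 and len({rows[g][end + 1] for g in genes}) == 1:
                    continue
                found.append((genes, start, end, pattern))
    return found


if __name__ == "__main__":
    # Toy expression matrix: 3 genes x 4 time points (invented values).
    expression = [
        [0.0, 1.0, 0.0, 1.0],
        [0.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, -1.0, 0.0],
    ]
    discretized = discretize(expression)
    print(discretized)
    for bicluster in maximal_ccc_biclusters(discretized):
        print(bicluster)
```

On this toy matrix the discretized rows are UDU, UDN and NDU, and the sketch reports three maximal CCC-Biclusters: genes 0 and 1 share pattern UD on columns 0-1, all three genes share D on column 1, and genes 0 and 2 share DU on columns 1-2. Grouping genes by identical symbol strings handles row-maximality, while the two neighbour-column checks reject any group that could be extended left or right with the same genes. The paper reports obtaining all maximal CCC-Biclusters in linear time by using suffix trees over the discretized rows instead of this exhaustive scan.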

Similar resources

Ccc-bicluster Analysis for Time Series Gene Expression Data

Many of the biclustering problems have been shown to be NP-complete. However, when the interest is in identifying biclusters in time series expression data, the problem can be limited to finding only maximal biclusters with contiguous columns. This restriction leads to a tractable problem. Its motivation is the fact that biological processes start and conclude in an identifiable contiguous p...

Efficient Biclustering Algorithms for Identifying Transcriptional Regulation Relationships Using Time Series Gene Expression Data

Biclustering algorithms have been shown to be remarkably effective in a variety of applications. Although the biclustering problem is known to be NP-complete, in the particular case of time series gene expression data analysis, efficient and complete biclustering algorithms are known and have been used to identify biologically relevant expression patterns. However, these algorithms, namely CCC-Bicl...

e-CCC-Biclustering: Related work on biclustering algorithms for time series gene expression data

This document provides supplementary material describing related work on biclustering algorithms for time series gene expression data analysis. We describe in detail three state-of-the-art biclustering approaches specifically designed to discover biclusters in gene expression time series and identify their strengths and weaknesses.

UniBic: Sequential row-based biclustering algorithm for analysis of gene expression data.

Biclustering algorithms, which aim to provide an effective and efficient way to analyze gene expression data by finding a group of genes with trend-preserving expression patterns under certain conditions, have been widely developed since Morgan et al. pioneered a work about partitioning a data matrix into submatrices with approximately constant values. However, the identification of general tre...

Evaluation of Plaid Models in Biclustering of Gene Expression Data

Background. Biclustering algorithms for the analysis of high-dimensional gene expression data were proposed. Among them, the plaid model is arguably one of the most flexible biclustering models up to now. Objective. The main goal of this study is to provide an evaluation of plaid models. To that end, we will investigate this model on both simulation data and real gene expression datasets. Metho...

Publication date: 2014